1. ABSTRACT

The life cycle of Severe Acute Respiratory
Syndrome Coronavirus (SARS-CoV) involves a unique
process called discontinuous transcription by which a
set of 3’ coterminal subgenomic mRNAs (sgmRNA)
with identical 5’ leader sequences can be generated.
The current study demonstrates that the replication
intermediates of minus strand of subgenomic RNA
(sgRNA) can be readily recovered from SARS-CoV
infected cells. A novel sgmRNA (M-1) was identified as
a short version of the membrane (M) gene. Transcriptional
regulatory sequences (TRS) of SARS-CoV and Mouse
Hepatitis Virus (MHV) sgmRNAs contain a species
specific core element (CE). The sizes of leader
sequences in MHVs vary not only in different viral strains
but also among different genes in the same strain. Leader
alterations such as deletion and nucleotide substitution
were observed in MHVs, while a dynamic one-orientation
“sequential deletion” was found among the leaders of 76
SARS-CoV isolates. These results imply that the leader
sequence of coronavirus might be unstable and that leader
alterations during SARS-CoV transmission in humans
might have a negative impact on its viral infectivity.


2. INTRODUCTION


Severe Acute Respiratory Syndrome (SARS)
coronavirus is the causative pathogen responsible for
the outbreak of the fatal atypical pneumonia worldwide
in 2002-2003 (1-3). Phylogenetic analysis initially
showed that SARS-CoV did not belong to any group
of known coronaviruses, but could be classified as
a group 2b virus based on the conservation of the
carboxyl terminal domain of spike protein (4). Due
to the recent expansion of novel coronaviruses,
the viruses have been classified into 4 genera
(alphacoronavirus, betacoronavirus, deltacoronavirus
and gammacoronavirus) based on phylogenetic
analysis and the conserved sequences of the
nonstructural proteins, while SARS-CoV was re-grouped
as a betacoronavirus (5).


SARS-CoV is a single-stranded positive RNA
virus with a genome size around 29,700 nucleotides
(nt), and presents a typical coronavirus genomic
organization (6, 7) that encodes at least 14 open
reading frames (ORFs). The first two thirds of the
SARS-CoV genome produces two large overlapping ORFs
(ORF 1a and ORF 1b). The other 12 ORFs are
generated from the remaining one third of the viral genome.
After infecting the cells via membrane fusion and
endocytosis, SARS-CoV releases its genetic content
into the cytoplasm and immediately synthesizes the
two large polyproteins pp1a and pp1ab from ORF 1a
and ORF 1b, involving a process called programmed
-1 ribosomal frame-shift (8). The polyproteins pp1a
and pp1ab are subsequently subjected to proteolytic
processing to generate 16 nonstructural
proteins, some of which are essential for viral
genome replication and viral RNA synthesis (9). The
ORFs located in the 3’ one third of the SARS-CoV
genome encode, in the 5’ to 3’ direction, four major
structural genes: S (ORF 2), E (ORF 4), M (ORF 5) and N
(ORF 9). Interspersed among them are the remaining ORFs,
which encode the other 8 accessory proteins; these are
regarded as group specific gene products since they
share almost no sequence homology with other groups
of coronaviruses (10). Two ORFs, ORF 3a and ORF 3b,
are located between the S and E genes; ORF 6, 7a, 7b,
8a and 8b lie between the M and N genes (10-12); and
the N gene also contains an internal ORF, 9b, whose
protein shuttles between cytoplasm and nucleus (13, 14)
and counteracts the innate immune response (15).


Upon the production of viral replicase
proteins, SARS-CoV enters the second stage of
viral infection by initiating viral genomic transcription
and replication, generating the positive
genome sized RNA as well as a nested set of 3’
coterminal subgenomic mRNAs (sgmRNA) with an
identical 5’ leader sequence via a unique mechanism
called discontinuous transcription (16, 17). A double
membrane vesicle (DMV) formed in SARS-CoV
infected cells is believed to be the site of viral replication
and transcription (18). A replication and transcription
complex (RTC) containing viral replicase-transcriptase
proteins plus other viral proteins as well as cellular
proteins is associated with the DMV, in which both genome
size and subgenomic RNAs are synthesized (16, 19).
To synthesize the minus stranded subgenomic
RNA, the RTC proceeds along the positive RNA genome
template from its 3’ end, and then either reads the body
TRS (TRS-B) upstream of each ORF as an attenuation
signal and relocates the nascent RNA to copy the
genomic leader sequence by recognizing the 3’ leader
TRS (TRS-L), or continues transcribing through
to the next TRS-B signal. The resulting minus
stranded RNAs then serve as templates for
the subsequent synthesis of mRNA.
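
The leader-to-body joining described above can be illustrated with a minimal Python sketch. It is a toy model only: the genome string, the gene segments and the function names `find_trs_sites` and `make_sgmrna` are hypothetical, the hexanucleotide ACGAAC (the SARS-CoV core element reported in this study) is treated as the sole switching signal, and the join is performed directly on the plus strand rather than through the minus-strand intermediate.

```python
CE = "ACGAAC"  # SARS-CoV TRS core element reported in this study


def find_trs_sites(genome, ce=CE):
    """Return the start index of every occurrence of the core element."""
    sites = []
    start = genome.find(ce)
    while start != -1:
        sites.append(start)
        start = genome.find(ce, start + 1)
    return sites


def make_sgmrna(genome, body_site, ce=CE):
    """Join the 5' leader (through TRS-L) to the sequence after one TRS-B."""
    leader_site = genome.find(ce)            # first CE is treated as TRS-L
    leader = genome[: leader_site + len(ce)]
    body = genome[body_site + len(ce):]      # everything 3' of the chosen TRS-B
    return leader + body


# Hypothetical toy genome: leader + TRS-L + "ORF1" + TRS-B + gene X + TRS-B + gene Y
toy_genome = "AUAUUAGG" + CE + "GGGGCCCC" + CE + "AUGAAAUAA" + CE + "AUGCCCUAA"

sites = find_trs_sites(toy_genome)
sgmrnas = [make_sgmrna(toy_genome, site) for site in sites[1:]]
for rna in sgmrnas:
    print(rna)
```

Every product in this toy example starts with the same leader and ends at the same 3’ terminus, which is the nested, 3’ coterminal arrangement the text describes.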


Nucleotide substitutions, insertions and
sequence deletions were frequently observed in
different isolates of SARS-CoV (20-25). We speculate
that all these changes may result from the adaptation
of SARS-CoV to humans during disease transmission.
Some sequence alterations have been shown to
occur increasingly in SARS patients (19) and may affect
viral infectivity (24). In addition, strain variations in
mouse hepatitis viruses (MHVs), which are betacoronaviruses,
may also influence viral infectivity and pathogenesis
(26). For example, a serine-to-glycine change at
position 310 of the S protein may account for the more
neurovirulent phenotype of the MHV-JHM strain (27). A
strain variant isolated from a persistently infected mouse
model acquired increased virulence and infectivity.
Subsequent sequencing analysis of this viral strain
uncovered a 29 amino acid deletion in the S protein that
may account for the loss of its cytotoxic T cell epitope,
indicating that adaptive mutation of MHV may be
important for viral immune escape (28).


In this study, the leader sequences of both
SARS-CoV and MHV were comparatively analyzed.
Different patterns of strain-specific leader
variation were apparent among the isolates of
SARS-CoV and MHV. Alignment of the
genomic leader sequences derived from 76 isolates
with complete SARS-CoV genomes demonstrated a
“sequential deletion” process, while the alterations of
the leader sequences derived from MHV isolates most
often presented as single nucleotide substitutions
and short sequence deletions. Thus, the current study
indicates that the leader sequences of coronaviruses
might be unstable during epidemics and disease
transmission.


3. MATERIALS AND METHODS 


3.1. Cell culture, SARS-CoV strain and 
viral infection 


Vero E6 cells were cultured in Dulbecco’s
modified Eagle’s medium (DMEM) supplemented with
10% heat-inactivated fetal bovine serum (FBS)
in 5% CO2 at 37°C. SARS-CoV strain HKU-39849
(29) (provided by Dr. KY Yuen, The University of Hong
Kong) was propagated in Vero E6 cells in a P3 laboratory.
The virus was released from the infected cells by
three cycles of freezing and thawing. 2x10° cells were
plated onto a 60 mm dish and infected with 110° MOI
of SARS-CoV (HKU-39849) in DMEM medium without
FBS for 2 hours at 37°C. Cells were washed once with
phosphate buffered saline (1xPBS, pH 7.4), supplemented
with 4 ml of complete medium, and incubated for another
24 hours at 37°C before harvesting. All operations
were performed in a bio-safety P3 laboratory.


3.2. The leader sequence information of 
SARS-CoV and MHV 


The genomic leader sequences of MHV JHM, 
MHV JHM.IA, MHV-A59, MHV S, MHV-1, MHV-2, and 
MHV-3 were deduced from their complete genomic 
cDNA sequences with GenBank accession numbers 
NC_006852, FJ647226, NC_001846, GU593319, 
FJ647223, AF201929 and FJ647224, respectively. 
The GenBank accession numbers for the leader 
containing sgmRNAs S, M, and N of MHV JHM were 
X04797, X04223.1, and X00990.1, respectively, and 
for the leader containing sgmRNAs S, M and N of 
MHV A59 were M18379.1, M25906 and M25907.1, 
respectively. The deduced genomic leader sequences
from 76 strains of SARS-CoV were retrieved from the
GenBank database according to the complete genomic
sequence information of the selected viral strains with
the GenBank accession numbers listed below: GZ02,
AY390556; HZS2-Bb, AY395004; ZS-C, AY395003;
CUHK-LC5, AY395002; CUHK-LC4, AY395001;
CUHK-LC3, AY395000; CUHK-LC2, AY394999;
ZS-A, AY394997; ZS-B, AY394996; HSZ-Cc, AY394995;
HSZ-Bc, AY394994; HGZ8L2, AY394993; HZS2-C,
AY394992; HZS2-Fc, AY394991; HZS2-E, AY394990;
HZS2-D, AY394989; JMD, AY394988; HZS2-Fb,
AY394987; HSZ-Cb, AY394986; TW3, AY502926;
BJ04, AY279354; HGZ8L1-A, AY394981; HGZ8L1-B,
AY394982; ZS-C, AY395003; HSZ2-A, AY394983;
GZ-C, AY394979; Tor2, NC_004718; BJ01, AY539954;
WHU, AY394850; NS-1, AY508724; TW10, AY502923;
TW2, AY502925; ShanghaiQXC1, AY463059; ZJ01,
AY286320; ShanghaiQXC2, AY463060; GD69,
AY313906; FRA, AY310120; SoD, AY461660;
Sino1-11, AY485277; CUHK-AG03, AY345988;
CUHK-AG02, AY345987; CUHK-AG01, AY345986;
CUHK-Su10, AY282752; PUMC03, AY357076; PUMC02,
AY357075; PUMC01, AY350750; GZ50, AY304495;
SZ16, AY304488; SZ3, AY304486; AS, AY427439;
HSR 1, AY323977; Sin2774, AY283798; HKU-39849,
AY278491; GD01, AY278489; TWC2, AY362698;
Sin2748, AY283797; Sin2679, AY283796; Urbani,
AY278741; ZMY 1, AY351680; TWY, AP006561; TWS,
AP006560; CUHK-W1, AY278554; TC3, AY348314;
TC2, AY338175; TC1, AY338174; TWC, AY321118;
Frankfurt 1, AY291315; Sino3-11, AY485278; BJ03,
AY278490; BJ02, AY278487; ZJ01, AY297028; TW1,
AY291451.


3.3. Primer designs and sequence information 


The extreme 5’ end of the complete genome
of SARS-CoV strain HKU-39849 served
as the template for the design of the common
forward primer, while the reverse primers were
deduced from the 3’ genomic sequences of the
structural genes (S, M, E, and N). The common
forward primer was 5’-atattaggtttttacctacc-3’.
The reverse primers for the structural
protein genes were 5’-ccatgcatagacagaagggaa-3’
(S), 5’-ttactgtactagcaaagcaa-3’ (M),
5’-ttagaccagaagatcagga-3’ (E), and
5’-gctattaaaatcacatgggga-3’ (N). The different
structural genes of SARS-CoV were amplified
by RT-PCR. The reaction products were
subcloned into the pGEM-T-easy Vector and subsequently
sequenced.
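
As a quick consistency check on the primer design described above, the short Python sketch below confirms that the common forward primer corresponds to the first 20 nt of the HKU-39849 leader as shown in Figure 2, and derives the plus-strand sequence that each reverse primer would anneal to. The helper names `reverse_complement` and `dna_to_rna` are illustrative and not part of the original methods.

```python
COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}


def reverse_complement(dna):
    """Reverse complement of a lowercase DNA primer."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def dna_to_rna(dna):
    """Write a DNA sequence in the RNA alphabet (uppercase, T replaced by U)."""
    return dna.upper().replace("T", "U")


# Primers as listed in Section 3.3
forward_primer = "atattaggtttttacctacc"
reverse_primers = {
    "S": "ccatgcatagacagaagggaa",
    "M": "ttactgtactagcaaagcaa",
    "E": "ttagaccagaagatcagga",
    "N": "gctattaaaatcacatgggga",
}

# First 20 nt of the HKU-39849 leader as given in the Figure 2 alignment
leader_5prime = "AUAUUAGGUUUUUACCUACC"

forward_matches_leader = dna_to_rna(forward_primer) == leader_5prime
print("forward primer matches leader 5' end:", forward_matches_leader)

# Plus-strand (sense) sequence each reverse primer is complementary to
target_sites = {gene: reverse_complement(p) for gene, p in reverse_primers.items()}
for gene, site in target_sites.items():
    print(gene, site)
```

Because the forward primer sits in the shared leader, pairing it with a gene-specific reverse primer amplifies only the leader-containing products for that gene.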


3.4. RNA isolation and RT-PCR analysis 


Total RNAs from 5 x 10® SARS-CoV infected
cells were extracted with 1 ml Trizol (Invitrogen) plus
200 µl chloroform. The supernatant was precipitated
with an equal volume of isopropanol. The RNA
pellet was then resuspended in sterile water. About 1-2
µg of RNA was used as template for one-step RT-PCR
analysis (Takara Biotechnology) as described in
the manual. The reaction mixture was first incubated
at 50°C for 30 minutes for reverse transcription,
and then denatured at 94°C for 2 minutes before the
PCR cycles. Thirty cycles of PCR were conducted under
the following conditions: 94°C for 30 seconds, 55°C for
30 seconds, and 72°C for 2 minutes. The reaction
products were extended at 72°C for 10 minutes
before storing at 4°C.
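
For planning purposes, the cycling program above can be written down as data and its programmed duration added up. The Python sketch below does only that arithmetic; the variable names are illustrative, and instrument ramp times and the final 4°C hold are deliberately ignored.

```python
# Cycling program from Section 3.4, as (label, temperature in °C, seconds).
RT_STEP = ("reverse transcription", 50, 30 * 60)
INITIAL_DENATURATION = ("initial denaturation", 94, 2 * 60)
CYCLE = [
    ("denaturation", 94, 30),
    ("annealing", 55, 30),
    ("extension", 72, 2 * 60),
]
N_CYCLES = 30
FINAL_EXTENSION = ("final extension", 72, 10 * 60)


def programmed_seconds():
    """Sum the programmed hold times, excluding ramping and the 4°C hold."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE)
    return (
        RT_STEP[2]
        + INITIAL_DENATURATION[2]
        + N_CYCLES * per_cycle
        + FINAL_EXTENSION[2]
    )


total_minutes = programmed_seconds() / 60
print(f"{total_minutes:.0f} min")  # prints "132 min"
```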


4. RESULTS 


4.1. Identification of one novel sgRNA from 
HKU-39849 strain of SARS-CoV 


The life cycle of coronavirus involves a
unique discontinuous transcription with the production
of replication intermediates, the subgenomic
minus strand RNAs. To identify these replication
intermediates, specific 5’ primers were designed to
amplify the sgRNAs of SARS-CoV HKU-39849 strain
for the S, N, M and E genes, respectively (Figure 1A and
1B). RT-PCR and sequencing analysis showed that
the minus strand replication intermediates
were indeed produced for the S, N, M, 3a/3b and E genes
(Figure 2). Unexpectedly, one novel sgRNA (named
M-1), together with its minus strand replication
intermediate, could also be detected (Figure 1A and 1B).
Sequencing analysis revealed that M-1 mRNA is a short
version of M sgmRNA that contains only the last 145 nt
of the M coding sequence (Figure 1C).


4.2. Analysis of the leader and TRS sequences of 
sgmRNAs derived from both SARS-CoV and MHV 


The leader-body fusion sites of SARS-CoV
and MHV were uncovered by sequence
comparison between the 5’ end of each sgmRNA and
the complete genomic RNA. The fusion sites for the S,
N, M, M-1, 3a/3b and E genes of SARS-CoV were
CUAAACGAAC, UAAACGAAC, UCUAAACGAAC,
ACGAAC, UAAACGAACUU and ACGAACUU,
respectively (the core element (CE), ACGAAC, is
shared by all six) (Figure 2). The fusion sites varied in
size from 6 to 11 nucleotides (nt), which resulted in
leader variants of 72 to 74 nt (Figure 2). For MHVs, the
size of the leader sequence varies not only in different
strains but also among different genes in the same
strain (Figure 3). The TRS sequences of MHVs were
conserved for the same genes in different strains,
while the sizes of the TRS apparently varied (Figure 3).
For example, the TRS for the S gene is composed of
10 nt with a single pentanucleotide (UCUAA) core
element, while the TRS sequences for both the M and N
genes contained a tandem repeat of the CE (Figure 3).
Moreover, the sizes of the leader sequences varied from
76 nt to 80 nt for MHV JHM (Figure 3A) and from 71 nt
to 80 nt for MHV A59 (Figure 3B).
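
The SARS-CoV fusion sites listed above can be checked programmatically. The following Python sketch, using only the six sequences reported in this section, confirms that each one contains the hexanucleotide ACGAAC and that their lengths span 6 to 11 nt; the dictionary and variable names are illustrative.

```python
CORE_ELEMENT = "ACGAAC"

# Leader-body fusion sites for SARS-CoV sgmRNAs as reported in Section 4.2
fusion_sites = {
    "S": "CUAAACGAAC",
    "N": "UAAACGAAC",
    "M": "UCUAAACGAAC",
    "M-1": "ACGAAC",
    "3a/3b": "UAAACGAACUU",
    "E": "ACGAACUU",
}

lengths = {gene: len(site) for gene, site in fusion_sites.items()}
all_contain_ce = all(CORE_ELEMENT in site for site in fusion_sites.values())

for gene, site in fusion_sites.items():
    print(f"{gene:6s} {site:12s} {lengths[gene]:2d} nt")
print("shortest:", min(lengths.values()), "nt; longest:", max(lengths.values()), "nt")
print("core element present in every site:", all_contain_ce)
```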
Figure 1. Evidence for the presence of both positive and negative subgenomic RNAs during SARS-CoV infection. Total RNAs were isolated from SARS-CoV
infected Vero E6 cells at 48 hr post-infection. For the detection of either positive (A) or negative sgRNAs (B), the first strand cDNAs were
synthesized independently by using oligo dT or anti leader (P_ ) as primer, respectively, and PCR was subsequently performed using S, M, N and
E specific primers. The suspected bands (white arrows) were isolated from agarose gel and subjected to sequencing analysis. (C) Sequence alignment
of M sgmRNA, M-1 sgmRNA and the leader sequence by ClustalW. Box indicates the conserved TRS core element of the sgmRNAs.
[Figure 2 image: ClustalW alignments of the HKU-39849 leader, the genome and the sgmRNA for each gene, with TRS and leader sizes given at the right of each panel; the sequence panels are not recoverable from the extracted text.]


Figure 2. The heterogeneity of the leader sequences in different sgmRNAs of SARS-CoV. Each sequenced sgmRNA (S, N, M, M-1, 3a/3b or E)
was aligned with both the leader sequence and its respective genome sequence by the ClustalW program. Underlines represent the identified TRS elements
shared among the viral genome sequence, the 5’ leader sequence and the sgmRNA. Box indicates the TRS core element (CE). The sizes of both the leader
sequence and the TRS are indicated in the right panel.
Figure 3. The heterogeneity of the leader sequences in different strains of MHV. ClustalW analysis of the MHV JHM and MHV A59 strains is shown in (A) and
(B), respectively. Each identified sgmRNA (S, N and M) was aligned with both the leader sequence and its respective genome sequence.
Underlines indicate the TRS elements shared among the genome sequence of a specific viral gene, the leader sequence and its corresponding
sgmRNA. Box indicates the TRS core element (CE). For each identified sgmRNA, the sizes of the leader sequence and the TRS are indicated in the right panel.


4.3. The different patterns of leader alterations in 
MHV versus SARS-CoV genome 


Although the role of the leader sequence in
regulating coronaviral transcription is well known, it is
still unclear how well the sequence at the 5’ end of the
coronavirus genome is conserved during viral transmission.
To address this issue directly, the leader sequences
of seven MHV strains were retrieved from GenBank:
A59, JHM.IA, JHM, strain 1, strain 2, strain 3 and
strain S, with accession numbers NC_001846, FJ647226,
NC_006852, FJ647223, AF201929, FJ647224 and
GU593319, respectively. ClustalW analysis showed
that four types of alterations were present at the 5’
end of MHV genomes: 5’ deletion, internal
deletion, nucleotide substitution and nucleotide
insertion (Figure 4A). Heterogeneous use of the
pentanucleotide CE has been observed in the leaders of
MHV sgmRNAs (30). Our sequence analysis indicates
that this heterogeneity in CE repeats might be directly
caused by genomic alterations of MHVs, in which
some strains such as MHV-2, MHV-A59 and MHV-S
present only two rather than three tandem repeats
of the pentanucleotide CE (Figure 4A). This type
of alteration does not affect correct fusion site
formation but reduces the leader size of the respective
sgmRNAs (Figure 3).


The next question that we asked was whether
the leader alterations of SARS-CoV were similar to those
of MHV. To address this question directly, the leader
sequences of 76 isolates of SARS-CoV were collected
from GenBank. We selected only those leader
sequences derived from SARS-CoV genomes marked as
complete rather than partially sequenced in
the NCBI database. Intriguingly, the ClustalW alignment
depicted in Figure 4B shows that SARS-CoVs might
undergo “sequential deletions” at the 5’ ends of their
genomes, in which some viral strains isolated at a later stage
such as ShanghaiQXC1 and ShanghaiQXC2 (isolated
on May 25, 2003, personal communication with
Dr. Zhenghong Yuan) completely lost their genomic
leader sequences. In contrast to MHV, only three
out of 76 SARS-CoV strains have a single nucleotide
substitution in their leader regions (Figure 4B). Thus,
a lower substitution rate and a higher deletion rate
might be associated with SARS-CoV leader sequences
during viral passage in the host.
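
The distinction drawn above between 5’ deletions and substitutions can be made explicit by tallying each type from a gapped alignment row. The Python sketch below is a minimal illustration: the reference is the first 20 nt of the HKU-39849 leader, while the three "isolate" rows are hypothetical examples constructed for this sketch, not sequences from the 76 isolates, and the function name `summarize_alteration` is likewise an assumption.

```python
def summarize_alteration(reference, aligned):
    """Count 5' gap length, other gaps and mismatches in one aligned row.

    Both strings must come from the same alignment (equal length, '-' = gap).
    Gaps after the first aligned base are all reported as 'other_gaps'.
    """
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    five_prime = len(aligned) - len(aligned.lstrip("-"))
    other_gaps = aligned[five_prime:].count("-")
    substitutions = sum(
        1
        for ref_base, base in zip(reference, aligned)
        if base != "-" and ref_base != "-" and base != ref_base
    )
    return {
        "five_prime_deletion": five_prime,
        "other_gaps": other_gaps,
        "substitutions": substitutions,
    }


reference = "AUAUUAGGUUUUUACCUACC"  # first 20 nt of the HKU-39849 leader

# Hypothetical aligned rows, for illustration only
toy_isolates = {
    "toy_deletion_4": "----UAGGUUUUUACCUACC",
    "toy_deletion_8": "--------UUUUUACCUACC",
    "toy_substitution": "AUAUUAGGUUCUUACCUACC",
}

summary = {name: summarize_alteration(reference, row) for name, row in toy_isolates.items()}
for name, counts in summary.items():
    print(name, counts)
```

Sorting real alignment rows by their `five_prime_deletion` value would be one simple way to display a progressive, one-directional loss of the kind described for Figure 4B.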


5. DISCUSSION 


In this study, the leader sequences retrieved
from the complete genomes of both SARS-CoV and MHV
were comparatively studied. The results indicate that
leader alterations might be common during coronaviral
infection. The extent of sequence identity between
TRS-L and TRS-B in SARS-CoV and MHV sgmRNAs
varied markedly, but all contained a species specific core
element (CE). The current data also show that sequence
deletions and nucleotide substitutions were frequently
observed in MHV genomes, while one-orientation
deletions at the 5’ end of leader sequences might be the
characteristic feature of SARS-CoV isolates.


Figure 4. Sequence alignment reveals the patterns of the leader alterations of MHV and SARS-CoV. (A) The 5’ ends of genomic sequences were retrieved
from completely sequenced genomes of seven MHV strains (JHM-IA, MHV1, MHV2, MHV3, MHVS, A59 and JHM) and subjected to sequence alignment
with ClustalW. The leader sequence is indicated by a line above the scale. CE repeats are boxed. (B) One-orientation “sequential deletions”
occurred at the 5’ ends of genomic leader sequences of SARS-CoV. The genomic leader sequences derived from the 76 isolates with completely
sequenced SARS-CoV genomes were aligned with the ClustalW program. The shaded area indicates the identical nucleotides, while the core element is
boxed. Dashed lines represent the deleted nucleotides. The bold line marks the genomic leader sequence.


Our study uncovered a novel sgmRNA
named M-1 from SARS-CoV infected cells. M-1 is likely
to be a genuine viral gene product since its minus strand
replication intermediate could be readily detected
(Figure 1B). Previous reports showed that the extent
of sequence identity between TRS-L and TRS-Bs ranged
from 7 to 18 nucleotides (31). However, the current study
revealed that the functional body-fusion junction can
be as short as a hexanucleotide (ACGAAC), as shown
in Figure 1B for M-1. Among the different species of
coronaviruses, MHV might be able to generate the
longest sequence identity (18 nt) between TRS-L and
TRS-B. A pentanucleotide core element (UCUAA)
can be repeated three times at the 3’ end of the leader
sequence in some strains of MHV such as MHV JHM.
Deleting one copy of this core element in the leader
sequence of the MHV A59 strain reduces the size of the
leader sequence but retains the sequence identity of the
TRS-L and TRS-B duplex. The MHV JHM strain is believed
to be more virulent than the MHV A59 strain (32, 33). MHV
JHM, carrying three copies of the CE, can cause acute
fatal encephalitis and chronic demyelination, whereas
MHV A59, with two copies of the CE, only induces hepatitis,
mild encephalitis and subacute demyelination. Since
TRS-L/TRS-B duplex formation could serve as a
driving force for viral transcription (31), we speculate
that reducing the CE copy number may impair viral
transcription and replication, and therefore eventually
reduce infectivity in some MHV strains.


The “sequential deletions” at the 5’ end of the genomic
RNA of SARS-CoV may also have a negative
impact on the viral life cycle during viral transmission.
In addition, earlier studies showed that the genomic
RNAs isolated from patients in the late epidemic stage
presented larger genomic deletions (20, 34).
We speculate that, when SARS-CoV enters the human
body, it has to cope with the strong pressure exerted by
the host immune system. This pressure may
act on the viral leader incrementally during
replication or transmission, eventually leading to its
complete loss during the late stage of the epidemic.